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Thermodynamics  governing  the  synthesis  of  DNA  and  RNA  strands  under  a  template  is  considered 
analytically  and  applied  to  the  population  dynamics  of  competing  replicators.  We  find  a  nonequilibrium 
phase  transition  for  high  values  of  polymerase  fidelity  in  a  single  replicator,  where  the  two  phases 
correspond  to  stationary  states  with  higher  elongation  velocity  and  lower  error  rate  than  the  other.  At  the 
critical  point,  the  susceptibility  linking  velocity  to  thermodynamic  force  diverges.  The  overall  behavior 
closely  resembles  the  liquid-vapor  phase  transition  in  equilibrium.  For  a  population  of  self-replicating 
macromolecules,  Eigen’s  error  catastrophe  transition  precedes  this  thermodynamic  phase  transition  during 
starvation.  For  a  given  thermodynamic  force,  the  fitness  of  replicators  increases  with  increasing  polymer¬ 
ase  fidelity  above  a  threshold. 
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The  replication  of  genes  by  biological  organisms  lies  at 
the  core  of  their  activities,  seemingly  driving  competitions 
for  survival  [1],  The  process  on  the  molecular  level  is 
driven  by  enzymes  (polymerases)  catalyzing  the  growth 
of  a  DNA  primer  strand  (the  nascent  chain  of  nucleotides 
complementary  to  the  template  strand)  based  on  the 
Watson-Crick  base  pairing  rules  (A  is  paired  with  T  and 
G  with  C).  Because  of  their  movement  along  the  chain, 
polymerases  belong  to  molecular  motors  converting 
chemical  free  energy  into  work  [2,3].  The  copying  process, 
however,  involves  errors,  whose  rate  varies  with  the  repli¬ 
cation  conditions  as  has  been  observed  recently  [4-7].  This 
thermodynamic  interpretation  of  replication  may  be  rele¬ 
vant  to  many  biological  phenomena:  mutation  rates  of 
bacteria  increase  under  stress  such  as  starvation  [8],  RNA 
viruses  maintain  their  mutation  rates  near  the  threshold 
of  error  catastrophe,  beyond  which  populations  cannot 
sustain  stable  genomes  [9,10]. 

Compared  to  a  polymerase-nucleotide  strand  complex 
(“replicator”)  far  from  equilibrium  primarily  driven  by 
external  chemical  potentials,  a  replicator  near  equilibrium 
copies  its  genome  with  higher  error  rates,  which  make 
extra  contributions  to  the  entropy  production  [4],  In  this 
Letter,  we  show  that  the  transition  between  these 
externally  driven  and  disorder-driven  regimes  of  replica¬ 
tion  exhibits  all  characteristic  features  of  phase  transitions 
one  normally  expects  in  equilibrium.  Many  systems  driven 
out  of  equilibrium  exhibit  phase  transitionlike  behavior, 
including  models  with  absorbing  states  [11]  and  the  evo¬ 
lution  of  opinions  in  social  networks  [12].  The  features 
observed  here,  however,  are  unique  in  the  sense  that  they 
offer  a  close  nonequilibrium  analog  of  the  liquid-vapor 
transition.  We  combine  this  single  molecule  thermodynam¬ 
ics  with  the  population  dynamics  of  replicators  [9],  which 
provides  a  thermodynamic  interpretation  of  molecular 
evolution. 


Considering  first  macroscopic  thermodynamics  of  a 
single  replicator  in  a  reservoir,  the  chemical  reaction 
involved  is 

(DNA)*,  +  M  (DNA)jv+1  +  P,  (1) 

where  (DNA)*,  is  the  primer  strand  of  length  N,  M  denotes 
one  of  four  nucleoside  triphosphate  (NTP)  monomers,  and 
P  is  pyrophosphate  (PPi).  The  entropy  of  the  system  plus 
surrounding  S  =  S(Nm,  N„,  N )  is  a  function  of  the  number 
of  free  monomers  Nm,  the  number  of  PPi  N„,  and  the  chain 
length  N  in  units  of  monomer  size.  Their  changes  are 
related  by  dNm  =  ~dNp  =  —dN.  The  entropy  change 
(or  production)  can  be  written  as  [13] 

dS  =  -^fdNm  -  ^dNp  +  f-dN  =  f~^GdN,  (2) 

where  //,,■  =  —TdS/dNj  are  the  chemical  potentials,  T  is 
temperature,  /  =  TBS/dN  is  the  external  force  acting 
on  the  primer  strand,  and  AG  =  ptp  —  ptm  =  AG°  — 
7Tn([M]/[P])  is  the  Gibbs  energy  change  of  reaction  (1) 
(Boltzmann  constant  is  1).  The  entropy  production  rate 
S  can  thus  be  written  as 

S  =  Fv,  (3) 

where  v  =  dN / dt  is  the  elongation  velocity  and  the  ther¬ 
modynamic  force  F  imposed  by  the  reservoir  is  given  by 


The  two  terms  in  Eq.  (4)  represent  mechanical  and 
chemical  driving  forces  of  the  reservoir.  We  have  assumed 
that  the  reaction  and  movements  are  tightly  coupled: 
Eqs.  (3)  and  (4)  correspond  to  a  special  case  of  a  more 
general  expression  for  molecular  motors  [Eq.  (7)  of 
Ref.  [2]  with  reaction  rate  equal  to  u].  Forces  acting  on 
nucleotide  strands  or  polymerase  can  be  controlled  with 
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optical  tweezers  [14],  and  we  assumed  that  the  direction 
of  /  coincides  with  that  of  elongation. 

Stochastic  dynamics  allows  us  to  relate  this  thermody¬ 
namic  description  to  molecular  features  [2,4,5]  and  derive 
the  constitutive  relation  between  F  and  v.  The  attachment 
(detachment)  rates  of  monomer  a  base-paired  to  (3  are 
denoted  k+(a\(3),  where  a  and  (3  are  any  of  m  possible 
monomer  types  [m  =  4  for  DNA  (RNA)].  The  conditional 
probability  Pn(an,  t\(3)  of  finding  the  chain  of  length  n  at 
time  t  with  sequence  a "  =  (an,  an-lt . . a{)  under  a 
given  template  sequence  (3  satisfies 


’T'  /  k+(a\j3)  \  _ 

4*u-0i/8)  +  v/p 


(12) 


which  determines  the  velocity  implicitly.  From  Eq.  (11), 
the  fraction  e  (error  rate)  of  monomers  for  which  a  A  /)t, 
where  (3 1  is  the  correct  Watson-Crick  complementary  base 
of  yS,  can  be  obtained  by 

e  =  (  Y  q{a\!3)\  .  (13) 

t  '  $ 


^PM',  t\(3)  =  t\f3) 

dt 

+  ^k-(a'\/3n+l)Pn+l(a'an,  t\(3) 

a' 

-  k-(an\(3n)Pn(an,  t\(3) 

-Yk+{a'\Pn+i)Pn(*n,t\l3).  (5) 

a' 

We  write  Pn{an,  t\f3 )  =  p„(t)q„(a"\/3 )  [4],  where  pn{t)  is 
the  probability  of  having  chain  length  n  at  time  t.  Since  the 
rate  constants  k±{a\/3 )  in  Eq.  (5)  are  all  local  functions  and 
do  not  depend  on  neighboring  nucleotides,  we  expect 

qn(an\(3)  =  q(an\/3n)q(an^i\/3n-i)  •  •  •  q(a1 \/3l)  (6) 

in  the  stationary  state,  where  q  =  q j.  Using  Eq.  (6)  in 
Eq.  (5),  assuming  that  the  template  sequence  distribution 
is  spatially  uncorrelated,  and  averaging  both  sides  over 
/3„  +  1,  we  can  write 

C^-q{a\(3)  =  k+{a\0)pn-x  +  J-q(a\f3)pn+l 

-  k-(a\/3)q(a\(3)pn  -  J+q(a\f3)pn,  (7) 

where  J+  =  X„(Mal/3)>/3,  J-  =  'Za(k-(a\P)q(c‘\(3))p 
are  the  rates  of  chain  growth  and  shrinkage. 

Summing  both  sides  of  Eq.  (7)  over  n, 

k+{a\[3)  +  [. J _  -  k-{a\(3)  -  J+]q(a\/3)  =  0,  (8) 

whereas  multiplying  both  sides  of  Eq.  (7)  by  n  and  sum¬ 
ming  over  n  using  v  =  Y., ,ndpn/dt,  we  have 

q(a\(3)v  =  k+(a\/3)  ~  J-q(a\/3).  (9) 

If  we  sum  Eq.  (9)  over  a  and  average  over  (3,  we  get 

v  =  J+-  7_,  (10) 


The  backward  rate  k_  can  be  written  in  terms  of  the 
forward  rate  k+  introducing  by 

k_(a\/3)  =  e~^Tk+{a\f3).  (14) 

The  parameter  —  gaj0  corresponds  to  the  binding  free  en¬ 
ergy  of  a  single  monomer  a  against  the  template  base  [3. 
This  free  energy,  however,  is  a  potential  of  mean  force 
for  the  given  fixed  nucleotide  pair.  Because  of  the  contri¬ 
bution  of  sequence  disorder  (see  below),  this  parameter  is 
in  general  not  equal  to  the  change  in  external  chemical 
potentials  AG.  The  entropy  production  per  monomer 
added  is  [4] 

F  =  (^ga^MP\ _  ^(a|y8)ln?(a|yS)  =  {l3*A  +  D 

(15) 

the  first  term  arising  from  the  consumption  of  monomers 
and  D  =  —  ^nq(a\P)) /3  from  the  sequence  dis¬ 

order,  the  Gibbs  entropy  of  the  uncertainty  of  a  paired  to  a 
known  f3  due  to  copying  errors.  Comparing  Eqs.  (4)  and 
(15),  we  note  that  if  D  =  0,  {gap)  =  /  —  AG:  the  average 
of  gafj  coincides  with  the  external  thermodynamic  driving 
force.  With  GAO,  however,  the  experimentally  control¬ 
lable  parameter  is  the  thermodynamic  force  F  given  by 
Eq.  (4)  rather  than  {gap).  Combining  Eqs.  (12)  and  (14), 
we  observe  that  v  =  0  when  Ya(e8“fi^T)B  =  1>  whereas  if 

gat 3  ->■  °°:  v  -*•  J+- 

To  derive  explicit  formulas  for  stationary  properties,  we 
adopt  the  following  simple  model: 

k+(a\/3)  =  8a^k  +  (1  -  8a^)yk, 
gafi/T  =  g  ~  (1  -  Saj8t)lni7,  (16) 


where  YacPa Ij8)  =  1  has  been  used.  Equations  (8)  and 
(10)  give 


q(a\(3) 


k+(a \P) 

k-{a\(3)  +  v 


(ID 


A  substitution  of  Eq.  (11)  into  the  normalization  condition 

YMa\P))t3  =  1  yields 


and  a  uniform  distribution  of  (3,  where  0  <  y  <  1  and 
r]  >  1  are  the  ratios  of  monomer  insertion  rates  k+  and 
dissociation  constants  k-  / k+  for  incorrect  to  correct  base 
pairs,  respectively,  (the  magnitude  of  binding  free  energy  is 
smaller  for  incorrect  pairs).  The  equilibrium  occurs  when 
8  =  £eq  =  —  ln[l  +  (m  —  l)/p\  Equation  (12)  gives  for 
v  =  v/k. 
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v  =  T  +  ^/r2  +  ye  s(m  —  1  +  rj  —  e  grj),  (17a) 
r  =  i[l  -  e~g  —  y  +  y(m  —  e_g77)].  (17b) 


The  error  rate  from  Eq.  (13)  in  this  case  becomes 


e  = 


m  —  1 

e~grj  +  v/y ' 


(18) 


while  Eq.  (15)  gives 


ln(l  +  egv)  ln(l  +  egv/r\y) 

- +  (m  —  1) - ; - 

e  g  +  v  e  8 rj  +  v/y 


(19) 


grown  stochastically  under  a  random  template  sequence  /3 
with  the  initial  condition  n  =  0.  The  numerical  results 
shown  in  Fig.  1  confirm  Eq.  (6). 

Equation  (19)  should  match  Eq.  (4)  in  stationary  states, 
and  Eqs.  (17)— (19)  give  the  dependence  of  v  and  eon  F 
parametrically  through  g  [Fig.  2],  which  resembles  the  p-V 
isotherms  of  van  der  Waals  gas:  below  a  critical  yc,  the 
system  exhibits  multiple  values  of  v  and  e  for  a  given  F. 
Most  polymerases  have  low  mismatch  rates  (y  ;£  10-4) 
and  are  expected  to  be  subcritical.  As  y  —> ►  0,  Eqs.  (20)  and 
(21)  give 


F  =  e0  ln[(m  -  l)(e0  1  -  1)/ rj].  (22) 


It  is  useful  to  consider  the  limit  of  perfect  selectivity 
y  — >  0.  The  values  of  v,  e,  and  F  in  this  limit  are 
L>0  =1  —  e~g  if  g  >  0  and  v  =  0  otherwise, 


fn  — 


1  -  eg 
0 


if  geq  ^  ^  0, 

if  g  —  0, 


(20) 


^)ln[(^)i]  if  £  — 

if  S'  —  0. 


(21) 


Figure  1  shows  the  dependence  of  stationary  properties  on 
g.  The  singular  behavior  of  F  versus  g  was  observed  from 
numerical  simulations  in  Ref.  [4],  which  is  revealed  here 
analytically.  Near  equilibrium,  a  polymerase  can  attach 
(detach)  monomers  with  two  different  relative  rates  for 
a  given  F  [Fig.  1(c)].  We  also  tested  Eq.  (6)  by  direct 
Monte  Carlo  simulations  of  Eq.  (5),  where  chains  were 


FIG.  1  (color  online).  Dependence  of  stationary  properties  on 
g.  (a)  v  for  m  =  4  and  77  =  5;  (b)  e  for  m  =  4  and  77  =  5;  (c)  F 
for  m  =  4  and  77  =  5;  (d)  convergence  of  v  in  numerical 
simulations  for  m  =  2,  77  =  10,  y  =  10-3,  and  g  =  0. 1;  and 
(e)  convergence  of  e  in  simulations  for  m  =  4,  77  =  5, 
y  =  10  2,  and  g  =  0.  Size  of  time  step  in  Monte  Carlo  was 
0.01  in  units  of  k~l.  Circles  in  (a)-(c)  are  from  the  numerical 
simulations. 


As  in  equilibrium  thermodynamics,  it  is  reasonable  to  rule 
out  the  segment  of  v  and  e  values  violating  the  stability 
criterion  for  the  susceptibility  x  =  dv/dF  >  0.  When  F  is 
lowered  (e.g.,  from  starvation),  one  reaches  a  spinodal 
where  \  ~ >  00 >  a|'d  v  and  e  discontinuously  jump  to 
smaller  and  larger  values,  respectively.  It  is  expected  that 
the  distributions  of  v  and  e  would  become  unusually  broad 
at  the  critical  point  where  dF/dv  =  d2F/dv2  =  0. 
Figure  3  shows  the  phase  diagram  analogous  to  the  p-T 
diagram  of  fluids,  where  the  stability  limit  of  the  low-error 
phase  terminates  at  the  critical  point  (Fc,  yc),  which  shifts 
to  smaller  values  with  increasing  77. 

How  will  this  behavior  of  a  single  replicator  affect 
the  population  dynamics  of  competing  molecules?  Such  a 
population  exists  as  a  quasispecies,  a  cloud  of  variable 
genotypes  surrounding  a  master  sequence  [9,15-17], 
whose  evolution  is  described  by  dnjdt  =  ’Y^jQijrjnj-> 
where  «,■  is  the  number  of  replicators  with  genotype  i  of 
length  L,  rt  =  vkjL  is  its  replication  rate  (“fitness”)  with 
NTP-incorporation  rate  kh  and  Q{)  =  ed‘j(  1  —  e)L~d‘i  is 
the  mutation  matrix  between  i  and  j  with  Hamming  dis¬ 
tance  djj  (the  number  of  nucleotides  that  are  different).  The 
more  common  form  of  the  quasispecies  dynamics  can  be 
derived  from  v,  =  n  JN,  where  N  =  X,n,  is  the  total 
population  size.  Using  Y.iQij  =  1>  one  can  also  write 
dN/dt  =  rN,  where  r  =  X/L-L  is  the  mean  fitness  of  the 
population.  Equation  (3)  is  generalized  as 


FIG.  2  (color  online),  (a)  Velocity  and  (b)  error  rate  as  func¬ 
tions  of  F  for  m  =  4  and  77  =  5.  The  solid  line  in  (b)  corre¬ 
sponds  to  Eq.  (22).  The  critical  point  is  reached  when 
yc  =  1.83  X  10“3. 
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FIG.  3  (color  online).  Phase  diagram  for  in  =  4  and  77  =  5. 
Thick  (red)  line  denotes  the  spinodal  of  high-u/low-e  phase 
terminating  at  the  critical  point  (Fc,  yc)  =  (0.210,  1.83  X  10-3). 
The  gray  scale  indicates  the  error  rate  e.  Thin  solid  (black)  lines 
show  constant-6  contours  at  e  =  10-4,  10-3,  10-2,  10_1.  Dotted 
(blue)  lines  show  constant-u  contours  at  v  =  0.1,  0.2,  0.3  from 
left  to  right. 


emergence  of  biological  information  from  a  population 
of  random  sequences  may  have  required  crossing  the  ther¬ 
modynamic  threshold  where  dv/dy  changes  sign. 

We  assumed  that  the  template  sequence  is  spatially 
uncorrelated  and  mutations  occur  only  via  substitutions. 
Real  genomic  sequences  show  long  range  correlations 
often  described  by  power  laws,  which  can  be  modeled 
by  considering  insertion,  deletion,  and  duplication  events 
[19-21].  It  will  be  of  interest  to  extend  the  current  study  to 
examine  such  effects. 

Funding  support  came  from  the  Department  of  Defense 
High  Performance  Computing  (HPC)  Modernization 
Program  Office  under  the  HPC  Software  Applications 
Institute  Initiative,  the  U.S.  Army  Medical  Research  and 
Materiel  Command. 


S  =  E^/i,r,L  =  NFLr,  (23) 


which  shows  that  fitness  increases  are  driven  by  the 
entropy  production. 

The  constitutive  relation  r  =  r(F )  on  the  population 
level  can  now  be  obtained  from  the  solution  of  Eigen 
model  in  the  infinite  population  limit  [16].  For  the  simplest 
landscape  kjk  =  5;1(A  —  1)  +  1,  where  i  =  1  denotes  the 
master  sequence  with  relative  fitness  A  >  1,  the  mean 
fitness  becomes 


where 


=  vx\Ae~Le  if  €  <  e*, 
L  1 1  otherwise, 


(24) 


is  the  error  threshold  [9].  For  e  >  e*,  therefore,  the  con¬ 
stitutive  relation  on  the  population  level  remains  the  same 
and  no  nontrivial  collective  effect  occurs.  With  e  below  the 
threshold,  on  the  other  hand,  Lr>  v  and  the  population 
enhances  its  collective  growth  rate  by  maintaining  the 
quasispecies  peaked  near  the  master  sequence.  Equation 
(25)  can  be  combined  with  Eq.  (18)  to  give  boundaries  of 
genomic  stability  in  the  phase  diagram  [Fig.  3]:  e(F,  y)  < 
\nA/L.  If  F  decreases  during  starvation  under  a  constant 
yt<  y  <  yc,  where  y ,  corresponds  to  a  “triple  point,”  the 
error  catastrophe  transition  (“melting”)  will  be  followed 
by  the  thermodynamic  transition  (“evaporation”). 

Can  a  population  with  highly  error-prone  polymerases, 
such  as  rudimentary  ribozymes  in  an  RNA  environment 
[18],  spontaneously  evolve  increasingly  lower  y  values  and 
eventually  enter  the  regime  of  genomic  stability? 
Evolutionary  walks  with  positive  changes  in  r  are  thermo¬ 
dynamically  favorable  for  a  given  F  [Eq.  (23)],  which 
occurs  in  Fig.  3  for  y  S  10-2  where  dv/dy  <  0.  An 
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